1. Preparing Data

In this study we use 13 indices reflecting the performance of the US economy. These indices are listed in the output below, which shows the name of each index and what it represents.

Note that the frequencies of the series differ from one another. To resolve this, all indices in this study were converted to a quarterly frequency. In addition, to handle the inconsistent time spans, i.e. the different starting dates of the indices, I used the subset of each series running from 1960 Q1 through 2019 Q4, the largest common window.
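Restricting a series to a common window can be done with base R's `window()`; a minimal sketch on a toy quarterly series (the real data are multivariate time-series objects, so the actual call differs in detail):

```r
# Toy quarterly series starting before the common window.
x <- ts(rnorm(4 * 70), start = c(1955, 1), frequency = 4)
# Restrict to the common 1960 Q1 - 2019 Q4 window.
x.sub <- window(x, start = c(1960, 1), end = c(2019, 4))
length(x.sub)  # 240 quarters = 60 years
```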

All index series are standardized before any statistical analysis: each series is centered (so its mean equals 0) and multiplied by a constant so that its variance equals 1. For example, the scale of Private Housing Units Permits Total SAAR (thousands) is much larger than that of the other series. By standardizing the data, we can set aside the magnitude of each series and focus on the relationships among them. Using these internal relationships, we can group economic time periods into distinct regimes and learn how to devise strategies for each regime.
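Base R's `scale()` performs exactly this transformation; a quick self-contained check on toy data with very different magnitudes:

```r
# Two toy columns on very different scales (e.g. thousands vs. percent).
x <- matrix(c(1200, 1500, 900, 2.1, 3.4, 1.8), ncol = 2)
z <- scale(x)            # center to mean 0, rescale to sd 1
round(colMeans(z), 10)   # both column means are 0
apply(z, 2, sd)          # both column standard deviations are 1
```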

# Remove rows with missing values
dat.all <- na.omit(dat)
df <- dat.all
dygraph(dat.all, main = "Original Data") %>% dyRangeSelector()
# Standardize (mean 0, variance 1)
dygraph(scale(dat.all), main = "Standardized Data") %>% dyRangeSelector()
indices_des
##       indices          des
##  [1,] "EHGDUS Index"   "US Real GDP (QoQ, %, SAAR)"
##  [2,] "CPI YOY Index"  "US CPI (inflation) Urban Consumer YoY NSA"
##  [3,] "CPI CHNG Index" "US CPI (inflation) Urban Consumer MoM SA"
##  [4,] "EHUPUS Index"   "US Unemployment Rate (%)"
##  [5,] "IP CHNG Index"  "US Industrial Production MoM SA"
##  [6,] "NHSPATOT Index" "Private Housing Units Permits Total SAAR (thousands)"
##  [7,] "NFP TCH Index"  "US Employment on Nonfarm Payrolls Total (SA, Net Monthly Change, thousands)"
##  [8,] "TMNOCHNG Index" "US Manufacturing New Orders Total MoM SA"
##  [9,] "LEI TOTL Index" "Conference Board US Leading Economic Indicator"
## [10,] "PITL YOY Index" "US Personal Income YoY SA"
## [11,] "CICRTOT Index"  "Federal Reserve Consumer Credit Total Net Change SA"
## [12,] "USCABAL Index"  "US Nominal Account Balance (Billions USD)"
## [13,] "M2% YOY Index"  "Federal Reserve Money Supply M2 YoY % Change"

Data Correlation

By examining the correlation between each pair of standardized indices, we can see that some indices are quite closely related to one another. Since we already have 13 indices, we can apply dimension-reduction techniques from statistics, such as PCA.

df <- scale(df)
cor.matrix <- cor(as.data.frame(df))
# Index pairs with correlation above 0.4 (excluding self-correlations)
cor.pairs <- which(cor.matrix > 0.4 & cor.matrix != 1, arr.ind = TRUE)
cor.pairs <- unique(t(apply(cor.pairs, 1, sort)))  # drop duplicate (i, j)/(j, i) pairs
colnames(cor.pairs) <- c("row", "col")
for (i in 1:nrow(cor.pairs)) {
  p <- c(cor.pairs[i, "row"], cor.pairs[i, "col"])
  x <- plot(df[, p], main = paste(colnames(cor.matrix)[p], collapse = " vs. "),
            legend.loc = "topleft")
  print(x)
  cat("The correlation between", paste(colnames(cor.matrix)[p], collapse = " and "),
      "is", round(cor.matrix[p[1], p[2]], 3), ".\n")
}

## The correlation between EHGDUS Index and IP CHNG Index is 0.44 .

## The correlation between EHGDUS Index and NFP TCH Index is 0.516 .

## The correlation between CPI YOY Index and CPI CHNG Index is 0.719 .

## The correlation between CPI YOY Index and PITL YOY Index is 0.709 .

## The correlation between LEI TOTL Index and CICRTOT Index is 0.662 .

## The correlation between CPI CHNG Index and PITL YOY Index is 0.516 .

## The correlation between PITL YOY Index and USCABAL Index is 0.495 .

## The correlation between IP CHNG Index and NFP TCH Index is 0.681 .

## The correlation between IP CHNG Index and TMNOCHNG Index is 0.501 .

## The correlation between NFP TCH Index and TMNOCHNG Index is 0.428 .

Classification

Principal Component Analysis (PCA)

Monthly data is too repetitious for PCA, since our aim is to identify broad historical economic regimes rather than any particular month; the series are therefore aggregated to a yearly frequency before running PCA.
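The PCA object `PCdf` used by `fviz_contrib()` below is not computed in the chunks shown; presumably it was produced along these lines. A sketch using base R's `prcomp` on a hypothetical stand-in for the standardized yearly matrix (the simulated `df.yearly` here is an assumption, not the study data):

```r
set.seed(1)
# Hypothetical stand-in: 53 years x 13 standardized indices.
df.yearly <- scale(matrix(rnorm(53 * 13), nrow = 53,
                          dimnames = list(NULL, paste0("idx", 1:13))))
# Data are already standardized, so no further centering/scaling is needed.
PCdf <- prcomp(df.yearly, center = FALSE, scale. = FALSE)
summary(PCdf)  # proportion of variance explained by each component
```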

fviz_contrib(PCdf, choice = "var", axes = 1)

fviz_contrib(PCdf, choice = "var", axes = 1:2)

Variable contributions to PC

These plots show how much each variable contributes to the variability captured by the top two principal components: the higher an index's contribution (%), the stronger the case for including it in our analysis.

Graph of individuals

Individuals with a similar profile are grouped together.

Graph of variables

Positively correlated variables point to the same side of the plot; negatively correlated variables point to opposite sides of the graph.

Biplot of both individuals and variables
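The individual, variable, and biplot graphs described above were presumably generated with factoextra's `fviz_pca_*` family; a sketch, assuming `PCdf` is the fitted PCA object (the exact options used in the original chunks are not shown):

```r
library(factoextra)

fviz_pca_ind(PCdf, repel = TRUE)     # individuals: similar years plot close together
fviz_pca_var(PCdf, repel = TRUE)     # variables: arrows show correlation directions
fviz_pca_biplot(PCdf, repel = TRUE)  # individuals and variables on one plot
```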

K-means

K-means clustering (MacQueen 1967) is one of the most commonly used unsupervised machine learning algorithms for partitioning a given data set into a set of k groups (i.e. k clusters), where k is the number of groups pre-specified by the analyst. It classifies objects into multiple groups (i.e., clusters) such that objects within the same cluster are as similar as possible (high intra-class similarity), whereas objects from different clusters are as dissimilar as possible (low inter-class similarity).
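This objective can be seen directly in base R's `kmeans()` output: it minimizes the total within-cluster sum of squares, leaving the between-cluster sum of squares large. A toy illustration (simulated points, not the study data):

```r
set.seed(42)
# Two well-separated toy clusters in 2-D.
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))
km <- kmeans(pts, centers = 2, nstart = 10)
km$tot.withinss  # within-cluster variability (low: members are similar)
km$betweenss     # between-cluster variability (high: clusters are dissimilar)
```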

# Using the PCA variable contributions, drop the least important indices
# from the original data.
df2 <- as.data.frame(df.yearly)[, c(1:3, 5, 7:10, 12:13)]
# The within-cluster variance decreases as k increases; a bend (or "elbow")
# appears around k = 4 to 5, and k = 5 is marked here and used below.
fviz_nbclust(df2, kmeans, method = "wss") +
  geom_vline(xintercept = 5, linetype = 2)

k2 <- kmeans(df2, centers = 5, nstart = 15)
fviz_cluster(k2, data = df2, palette = "Set2", ggtheme = theme_minimal())

Conclusion

By means of PCA and k-means clustering, we can classify the historical years from 1967 to 2019 into four different economic regimes. The first regime comprises 2008 and 2009 (the global financial crisis), 2001 (the bursting of the dot-com bubble, when speculation in internet companies collapsed), and so on; this regime therefore represents financial crises. The second regime is characterized by the Private Housing Units Started Total Monthly Change index; both 1997-1999 and 2012-2014 fall into this regime, which indicates economic recovery. The third regime, indicated by the US Initial Jobless Claims index and the two CPI indices, is represented by 1974 (the 1973-1974 stock market crash) and 1979-1981 (the early-1980s recession). The fourth regime is dominated by high values of GDP, nonfarm payrolls, and industrial production; it is represented by 1976-1978, a period of economic expansion.